Intonation issues in HMM-based speech synthesis for Vietnamese

Authors

  • Thi Thu Trang Nguyen
  • Do Dat Tran
  • Albert Rilliard
  • Christophe d'Alessandro
  • Thi Ngoc Yen Pham
Abstract

In an HMM-based text-to-speech (TTS) system, contextual features, including phonetic and prosodic factors, have a significant influence on the spectrum, F0 and duration of the synthetic voice. This paper proposes prosodic features aimed at improving the naturalness of an HMM-based TTS system (VTed) for a tonal language, Vietnamese. ToBI (Tones and Break Indices) features are used to learn two crucial prosodic cues, intonation (boundary tones) and pauses (break indices), concurrently with another set of features. The results of a MOS test showed that the general quality of the synthetic voice is rather good, 1.21 points lower than the natural voice. In about 55% of cases, the voice trained with the ToBI boundary tone feature is perceived as similar to the voice trained without this feature, while a 10% difference in favour of the voice trained without this ToBI feature is observed. This may be linked to the F0 contour being lowered or raised regardless of lexical tones, which causes two main problems in the synthetic voice: discontinuity in spectrum and F0, and unexpected voice quality. The paper concludes that much more work is needed on intonation modeling that takes the Vietnamese tones into account. A new prosody model could be designed, possibly drawing on the ToBI model, with respect to lexical tones and the syntactic structure of Vietnamese.

Index Terms: text-to-speech (TTS), speech synthesis, tonal language, Vietnamese, HMM-based speech synthesis, intonation, ToBI


Similar resources

Vietnamese HMM-based speech synthesis with prosody information

Generating a natural-sounding synthetic voice is an aim of all text-to-speech systems. To meet this goal, many prosody features have been used in the full-context labels of an HMM-based Vietnamese synthesizer. In the prosody specification, POS and intonation information are considered less important than positional information. The paper investigates the impact of POS and intonation tagging on the natur...

Improvement of prosodic characteristic in Vietnamese speech synthesis system base on HMM

The key factors that help people understand the synthesized voices of text-to-speech systems are naturalness and intelligibility. However, making voices more natural remains a difficult task because of the scarcity of speech data. With a data-limited corpus, prosodic information such as tone, intonation and part-of-speech is added to ensure the quality of the synthetic speech. In the paper, we in...

HMM-based TTS for Hanoi Vietnamese: issues in design and evaluation

This paper presents the development and evaluation of an HMM-based TTS system for the modern Hanoi dialect of Northern Vietnamese, a tonal language. A study of specific phonetic and prosodic features of Hanoi Vietnamese is discussed. Consequences on the design of an HMM-based TTS system are derived. Using this knowledge, a TTS system, called VTed, is then developed under the Mary TTS platform. ...

HMM-Based Speech Synthesis for the Greek Language

The success and dominance of Hidden Markov Models (HMMs) in the field of speech recognition tend to extend to the area of speech synthesis as well, since HMMs provide a generalized statistical framework for efficient parametric speech modeling and generation. In this work, we describe the adaptation, implementation and evaluation of the HMM speech synthesis framework for the case of the ...

Generating intonation from a mixed CART-HMM model for speech synthesis

This paper proposes two algorithms for generating intonation from a mixed CART-HMM intonation model for speech synthesis. Based either on a Viterbi search or on the Expectation-Maximization algorithm, the two generation algorithms are analyzed in terms of likelihood and F0 Root Mean Square Error. Listening tests are performed to subjectively evaluate the quality of the generated intonation.

Publication date: 2014